Feature learning

In machine learning, feature learning or representation learning^[2] is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data. This replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task.

Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensor data have not yielded to attempts to define specific features algorithmically. An alternative is to discover such features or representations by examining the data itself, without relying on hand-designed feature-extraction algorithms.

Feature learning can be supervised, unsupervised, or self-supervised.

In supervised feature learning, features are learned using labeled input data. Labeled data consists of input-label pairs: the input is given to the model, which must produce the ground-truth label as the correct answer.^[3] Training against these labels yields feature representations that support high label-prediction accuracy. Examples include supervised neural networks, multilayer perceptrons, and (supervised) dictionary learning.
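As a minimal sketch of supervised feature learning, the Python below trains a small multilayer perceptron on the four labeled XOR examples, whose raw inputs are not linearly separable. The hidden tanh layer is the learned representation: its activations are shaped only by the gradient of the label-prediction loss. The XOR dataset, the hidden size of 8, the learning rate, the epoch count, and the names `TinyMLP` and `DATA` are illustrative assumptions, not part of any particular library.

```python
import math
import random

# Toy labeled data (XOR): the raw inputs are not linearly separable, so the
# network has to learn a hidden representation that makes them separable.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyMLP:
    """Hypothetical toy model: one tanh hidden layer (the learned features) and a sigmoid output."""

    def __init__(self, n_in=2, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def features(self, x):
        # Hidden activations: the representation shaped by the labels.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self.features(x)
        return sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def loss(self, data):
        # Mean binary cross-entropy against the ground-truth labels.
        total = 0.0
        for x, y in data:
            p = self.predict(x)
            total -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
        return total / len(data)

    def train(self, data, lr=0.5, epochs=3000):
        n = len(data)
        for _ in range(epochs):
            # Accumulate full-batch gradients before updating any weight.
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, y in data:
                h = self.features(x)
                p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
                d_out = p - y  # dLoss/dlogit for sigmoid + cross-entropy
                g_b2 += d_out
                for j, hj in enumerate(h):
                    g_w2[j] += d_out * hj
                    d_h = d_out * self.w2[j] * (1 - hj * hj)  # backprop through tanh
                    g_b1[j] += d_h
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_h * xi
            # Gradient-descent step on every parameter.
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j] / n
                self.b1[j] -= lr * g_b1[j] / n
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * g_w1[j][i] / n
            self.b2 -= lr * g_b2 / n


model = TinyMLP()
before = model.loss(DATA)
model.train(DATA)
after = model.loss(DATA)
print(f"training loss: {before:.3f} -> {after:.3f}")
for x, y in DATA:
    print(x, "label", y, "prediction", round(model.predict(x), 2))
```

Full-batch gradient descent is used only to keep the sketch short and deterministic; practical networks are usually trained with mini-batches in a framework that provides automatic differentiation.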
In unsupervised feature learning, features are learned from unlabeled input data by analyzing the relationships between points in the dataset.^[4] Examples include dictionary learning, independent component analysis, matrix factorization,^[5] and various forms of clustering.^[6]^[7]^[8]
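One simple way clustering can act as an unsupervised feature learner is sketched below, under stated assumptions: k-means (Lloyd's algorithm) learns centroids from unlabeled 2-D points, and each point is then re-encoded as a one-hot vector marking its nearest centroid. The synthetic three-blob data, the deterministic farthest-point initialization (swapped in here for random or k-means++ seeding), and the helper names `kmeans`, `encode`, and `farthest_point_init` are hypothetical choices for illustration.

```python
import math
import random


def nearest(centroids, p):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(centroids[c], p))


def farthest_point_init(points, k):
    # Deterministic seeding: start from the first point, then repeatedly add
    # the point farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(c, p) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=20):
    """Lloyd's algorithm on unlabeled points; returns k learned centroids."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def encode(centroids, p):
    """Learned feature vector: one-hot indicator of the nearest centroid."""
    idx = nearest(centroids, p)
    return [1.0 if c == idx else 0.0 for c in range(len(centroids))]


# Unlabeled data: three noisy blobs whose centres the algorithm is never told.
rng = random.Random(0)
centres = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + rng.gauss(0, 0.4), cy + rng.gauss(0, 0.4))
          for cx, cy in centres for _ in range(30)]

centroids = kmeans(points, k=3)
print(encode(centroids, (0.2, -0.1)))  # a one-hot vector of length 3
```

Softer encodings, such as the distance from a point to every centroid, retain more information than the one-hot indicator used here.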
In self-supervised feature learning, features are learned from unlabeled data, as in unsupervised learning; however, input-label pairs are constructed from each data point itself, which allows the structure of the data to be learned with the same tools as supervised learning, such as gradient descent on a prediction loss.^[9] Classical examples include word embeddings and autoencoders.^[10]^[11] Self-supervised learning (SSL) has since been applied to many modalities through deep neural network architectures such as convolutional neural networks (CNNs) and transformers.^[9]
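The self-supervised step of building input-label pairs from unlabeled data can be sketched with a skip-gram-style word embedding: each word in raw text becomes an input whose labels are its neighboring words, and embeddings are then fit by gradient descent on that prediction task. This is a minimal sketch, assuming a toy corpus, a window of 2, 4-dimensional embeddings, and a full softmax over the vocabulary (Mikolov et al. instead use negative sampling or hierarchical softmax for efficiency); `skipgram_pairs` and `train_skipgram` are hypothetical names.

```python
import math
import random


def skipgram_pairs(tokens, window=2):
    """Turn raw text into (input word, label word) pairs: each word predicts its neighbors."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def train_skipgram(pairs, vocab, dim=4, lr=0.1, epochs=200, seed=0):
    """Fit input/output embedding tables by SGD on softmax cross-entropy."""
    rng = random.Random(seed)
    index = {w: k for k, w in enumerate(vocab)}
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]

    def sgd_step(c, o):
        v = w_in[c]  # embedding of the input word (updated in place below)
        scores = [sum(a * b for a, b in zip(u, v)) for u in w_out]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # d(cross-entropy)/d(score_w) = p_w - 1[w == o]
        grads = [p - (1.0 if w == o else 0.0) for w, p in enumerate(probs)]
        grad_v = [sum(g * u[d] for g, u in zip(grads, w_out)) for d in range(dim)]
        for w, g in enumerate(grads):
            for d in range(dim):
                w_out[w][d] -= lr * g * v[d]
        for d in range(dim):
            v[d] -= lr * grad_v[d]
        return -math.log(probs[o])

    history = []  # mean loss per epoch
    order = list(pairs)  # shuffle a copy, leaving the caller's list intact
    for _ in range(epochs):
        rng.shuffle(order)
        epoch_loss = sum(sgd_step(index[c], index[o]) for c, o in order)
        history.append(epoch_loss / len(order))
    return {w: w_in[index[w]] for w in vocab}, history


tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(tokens))
pairs = skipgram_pairs(tokens)
print(pairs[:3])  # [('the', 'cat'), ('the', 'sat'), ('cat', 'the')]
embeddings, history = train_skipgram(pairs, vocab)
print(f"mean loss: first epoch {history[0]:.3f}, last epoch {history[-1]:.3f}")
```

No word in the corpus carries a label; every training target comes from the text's own structure, which is what distinguishes this setup from supervised learning on annotated data.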

[1] Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep Learning. Cambridge, Massachusetts. pp. 524–534. ISBN 0-262-03561-8. OCLC 955778308.
[2] Bengio, Y.; Courville, A.; Vincent, P. (2013). "Representation Learning: A Review and New Perspectives". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (8): 1798–1828. arXiv:1206.5538. doi:10.1109/tpami.2013.50. PMID 23787338. S2CID 393948.
[3] Russell, Stuart J.; Norvig, Peter (2010). Artificial Intelligence: A Modern Approach (3rd ed.). Prentice Hall. ISBN 978-0-13-604259-4.
[4] Hinton, Geoffrey; Sejnowski, Terrence (1999). Unsupervised Learning: Foundations of Neural Computation. MIT Press. ISBN 978-0-262-58168-4.
[5] Srebro, Nathan; Rennie, Jason D. M.; Jaakkola, Tommi S. (2004). "Maximum-Margin Matrix Factorization". NIPS.
[6] Cite error: the named reference coates2011 was invoked but never defined.
[7] Csurka, Gabriella; Dance, Christopher C.; Fan, Lixin; Willamowski, Jutta; Bray, Cédric (2004). "Visual categorization with bags of keypoints". ECCV Workshop on Statistical Learning in Computer Vision.
[8] Jurafsky, Daniel; Martin, James H. (2009). Speech and Language Processing. Pearson Education International. pp. 145–146.
[9] Ericsson, Linus; Gouk, Henry; Loy, Chen Change; Hospedales, Timothy M. (May 2022). "Self-Supervised Representation Learning: Introduction, advances, and challenges". IEEE Signal Processing Magazine. 39 (3): 42–62. arXiv:2110.09327. Bibcode:2022ISPM...39c..42E. doi:10.1109/MSP.2021.3134634. ISSN 1558-0792. S2CID 239017006.
[10] Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg S.; Dean, Jeff (2013). "Distributed Representations of Words and Phrases and their Compositionality". Advances in Neural Information Processing Systems. 26. Curran Associates, Inc. arXiv:1310.4546.
[11] Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep Learning. Cambridge, Massachusetts. pp. 499–516. ISBN 0-262-03561-8. OCLC 955778308.
